Extracting String Features with Adaptation for Text Classification
Similar Resources
Boosting for Text Classification with Semantic Features
Current text classification systems typically use term stems for representing document content. Ontologies allow the usage of features on a higher semantic level than single words for text classification purposes. In this paper we propose such an enhancement of the classical document representation through concepts extracted from background knowledge. Boosting, a successful machine learning tec...
Extracting line string features from GPS logs
Geo data is an important foundation for any type of location-based service, but geo data is often expensive, or its distribution is limited by license restrictions. As a solution, open projects collect geo data from individuals and publish these data under a public license. A major drawback is that correcting and integrating the collected data is difficult and cost-intensive. This paper descr...
String Subsequence Kernels for Text Classification
This paper explores the string subsequence kernel, a kernel function whose feature space is generated by subsequences of strings. This kernel compares two strings based on the number of occurrences of common substrings they contain, where each common substring is weighted based on how contiguous that substring is within the string. Although a recursive definition of the string subsequence kerne...
Text Classification using String Kernels
We propose a novel approach for categorizing text documents based on the use of a special kernel. The kernel is an inner product in the feature space generated by all subsequences of length k. A subsequence is any ordered sequence of k characters occurring in the text though not necessarily contiguously. The subsequences are weighted by an exponentially decaying factor of their full length in t...
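As a rough illustration of the kernel described in this abstract, the Python sketch below enumerates every length-k subsequence by brute force and weights each occurrence by a decay factor raised to the length of the span it covers in the string. The function names, the decay parameter `lam` and its default value are assumptions made for illustration. This is not the authors' algorithm, and the exhaustive enumeration is only practical for short strings.

```python
from collections import defaultdict
from itertools import combinations


def subsequence_features(s, k, lam):
    """Map each length-k subsequence of s to its decayed weight.

    Every occurrence (a choice of k increasing positions) contributes
    lam ** span, where span is the distance from the first to the last
    chosen position, inclusive. Hypothetical helper for illustration.
    """
    feats = defaultdict(float)
    for idx in combinations(range(len(s)), k):
        u = "".join(s[i] for i in idx)
        span = idx[-1] - idx[0] + 1
        feats[u] += lam ** span
    return feats


def ssk(s, t, k, lam=0.5):
    """Unnormalized subsequence kernel: inner product of the two feature maps."""
    fs = subsequence_features(s, k, lam)
    ft = subsequence_features(t, k, lam)
    return sum(v * ft[u] for u, v in fs.items() if u in ft)
```

For example, with k = 2 and lam = 0.5 the strings "cat" and "car" share only the subsequence "ca", which has weight 0.5^2 in each string, so `ssk("cat", "car", 2)` evaluates to 0.5^4 = 0.0625.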
Sample-oriented Domain Adaptation for Image Classification
Image processing is a method of performing operations on an image in order to obtain an enhanced image or to extract useful information from it. Conventional image processing algorithms do not perform well when the training images (source domain) used to learn the model have a different distribution from the test images (target domain). Also, many real world applicat...
Journal
Journal title: Journal of Natural Language Processing
Year: 2010
ISSN: 1340-7619
DOI: 10.5715/jnlp.17.1_77